In 2020, COVID-19 became a serious pandemic that affected the lives of millions of people. As of December 5, 2020, there were about 14.3 million confirmed cases of COVID-19 in the United States and about 278,000 confirmed deaths. Because the numbers of confirmed cases and deaths have trended upward since the beginning of 2020, it is crucial to identify methods for combating the disease. There has been much discussion of the importance of wearing masks to slow the spread of COVID-19, and we explore this relationship through statistical tests and inference. Because states with larger populations will likely have more cases, we use the proportion of each state's population with a confirmed case as the response variable describing the prevalence of COVID-19 in that state.
We want to know whether the presence of a state-wide mask mandate is correlated with the proportion of a state's population that has confirmed COVID-19 cases. We hypothesize that a state-wide mask mandate is correlated with a lower proportion of cases.
Our data are publicly provided by JHU CSSE (the Johns Hopkins University Center for Systems Science and Engineering), which is unaffiliated with this analysis. The original data were CSV files, each containing one day of observations for \(60\) states and territories in the United States, ranging from April 4, 2020 to the time of writing.
We read and formatted the data using the Pandas Python library (McKinney 2010). We first combined all CSV data into a single dataframe, keeping only the \(50\) states and observations before October 23, 2020. This resulted in \(11500\) rows and \(14\) columns, with attributes such as state name, timestamp, confirmed cases, and deaths. Because attributes such as confirmed cases and deaths are cumulative time series, we differenced them by subtracting each state's previous-day value from its current value, setting the first day's value to \(0\). Finally, we divided the numeric attributes by each state's population.
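The preprocessing steps above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names (`Province_State`, `Date`, `Confirmed`, `Deaths`), the `preprocess` helper, and the population mapping are hypothetical, and the `Date` column is assumed to have been added to each daily frame (for example, parsed from the file name). Differencing is done per state with `groupby(...).diff()`, so one state's first day is never differenced against another state's last day.

```python
import pandas as pd


def preprocess(daily_frames, population, states, end_date="2020-10-23"):
    """Combine daily frames, keep the given states, difference cumulative counts,
    and scale by population.

    Hypothetical input columns: Province_State, Date, Confirmed, Deaths.
    `population` is a hypothetical dict mapping state name -> population.
    """
    # Stack every daily report into one long dataframe.
    df = pd.concat(daily_frames, ignore_index=True)
    df["Date"] = pd.to_datetime(df["Date"])

    # Keep only the requested states and observations strictly before the cutoff.
    keep = df["Province_State"].isin(states) & (df["Date"] < pd.Timestamp(end_date))
    df = df[keep].sort_values(["Province_State", "Date"]).reset_index(drop=True)

    cumulative = ["Confirmed", "Deaths"]
    # Daily change = today's cumulative value minus yesterday's, per state;
    # each state's first day has no predecessor, so it is set to 0.
    for col in cumulative:
        df["New_" + col] = df.groupby("Province_State")[col].diff().fillna(0)

    # Divide cumulative and daily counts by the state's population.
    pop = df["Province_State"].map(population)
    for col in cumulative + ["New_" + c for c in cumulative]:
        df[col + "_Prop"] = df[col] / pop
    return df
```

In practice the daily frames might come from something like `[pd.read_csv(p) for p in sorted(paths)]`, followed by whatever renaming is needed to match the assumed column names.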
The mask mandate variable was hard-coded as a binary indicator, where \(1\) denotes the presence of a state-wide mandate and \(0\) its absence. The mandate data were provided by Axios (Fernandez 2020).
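A sketch of how such a hard-coded indicator might be attached to the combined dataframe is shown below. The state names and 0/1 values in `MASK_MANDATE` are placeholders, not the Axios data, and `add_mandate_indicator` is a hypothetical helper. Raising an error on unmapped states guards against silently producing missing values when a state name is spelled differently in the two sources.

```python
import pandas as pd

# Placeholder states and values for illustration only (NOT the Axios data).
MASK_MANDATE = {"State A": 1, "State B": 0}


def add_mandate_indicator(df, mandate=MASK_MANDATE, state_col="Province_State"):
    """Return a copy of df with a 0/1 Mask_Mandate column looked up by state name."""
    out = df.copy()
    out["Mask_Mandate"] = out[state_col].map(mandate)
    # Fail loudly if any state is missing from the hand-coded table.
    if out["Mask_Mandate"].isna().any():
        missing = sorted(out.loc[out["Mask_Mandate"].isna(), state_col].unique())
        raise KeyError(f"no mandate value for: {missing}")
    out["Mask_Mandate"] = out["Mask_Mandate"].astype(int)
    return out
```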
We fit ordinary least squares (OLS) linear regression models to assess the significance of the mask mandate variable in predicting two responses: a state's cumulative proportion of confirmed cases and its proportion of daily new cases. A Q-Q plot showed the cumulative proportion of cases to be approximately normal, whereas the proportion of daily new cases was skewed and required a log transformation.
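For reference, the two models described above can be written as follows. The notation is introduced here for illustration only and assumes the usual OLS error assumptions:

\[
\begin{aligned}
y_i &= \beta_0 + \beta_1 m_i + \varepsilon_i, \\
\log d_j &= \gamma_0 + \gamma_1 m_j + u_j, \qquad d_j > 0,
\end{aligned}
\]

where \(y_i\) is a cumulative proportion of confirmed cases, \(d_j\) is a proportion of daily new cases, \(m \in \{0, 1\}\) is the mask mandate indicator, and \(\varepsilon_i\), \(u_j\) are independent, normally distributed errors with mean zero. Testing the significance of the mandate then amounts to testing \(H_0\colon \beta_1 = 0\) (respectively \(\gamma_1 = 0\)), typically against a two-sided alternative.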
Figure 1: The distribution of states with and without mask mandates.
Figure 2: The distribution of cumulative percentage of confirmed cases.
Using the proportion of confirmed cases as the dependent (response) variable and the mask mandate as the independent (predictor) variable, we found that the mask mandate variable was significant (\(p < 0.01\)). The estimated slope was \(-1.79\), indicating that the presence of a mask mandate was associated with a decrease of about \(1.8\) percentage points in a state's predicted cumulative percentage of confirmed cases. This supports our hypothesis that a mask mandate is correlated with a lower proportion of cases. Diagnostic plots showed that the residuals were approximately normal, and no model assumptions were clearly violated.
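Because the mask mandate is a single 0/1 predictor, the OLS slope is exactly the difference between the mean response of mandate states and that of non-mandate states, and the intercept is the mean of the non-mandate group. The pure-Python sketch below checks this identity using hypothetical helpers (`ols_fit`, `group_mean_difference`) and synthetic data. It computes only point estimates; the reported p-value would additionally require the slope's standard error and a t distribution, for example from a statistics package.

```python
from statistics import fmean


def ols_fit(x, y):
    """Closed-form simple OLS: return (intercept, slope) for y ~ b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def group_mean_difference(x, y):
    """Mean of y where x == 1 minus mean of y where x == 0."""
    with_mandate = [yi for xi, yi in zip(x, y) if xi == 1]
    without_mandate = [yi for xi, yi in zip(x, y) if xi == 0]
    return fmean(with_mandate) - fmean(without_mandate)
```

For example, with `x = [1, 1, 0, 0, 0]` and `y = [2.0, 3.0, 4.0, 5.0, 6.0]`, `ols_fit` returns an intercept of 5.0 and a slope of -2.5 (up to floating-point rounding), and `group_mean_difference` also returns -2.5.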
Next, we considered the proportion of daily new cases per state, for a total of \(11500\) data points. Because this attribute was heavily skewed, we applied a log transformation, excluding rows with non-positive values, which left \(11087\) rows. A linear regression model found the mask mandate variable to be highly significant (\(p < 0.001\)), with a back-transformed slope of \(0.72\). This can be interpreted as a state with a state-wide mask mandate having a proportion of daily new cases roughly 28% lower (\(1 - 0.72 = 0.28\)) than a state without one. Again, diagnostic plots showed no clearly violated assumptions.
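The back-transformation can be made concrete with a short sketch. It assumes a natural log (the base is not stated above; with base 10 one would use `10 ** slope` instead). On the log scale, the OLS slope for a 0/1 predictor is the difference in mean log values, so exponentiating it gives the ratio of the two groups' geometric means; a ratio of \(0.72\) corresponds to a change of \(0.72 - 1 = -0.28\), i.e. about 28% lower. The helpers `mandate_ratio` and `percent_change` and the data are hypothetical.

```python
import math
from statistics import fmean


def mandate_ratio(mandate, daily_prop):
    """Fit log(daily_prop) ~ b0 + b1 * mandate by OLS and return exp(b1).

    Rows with non-positive proportions are dropped first, because the log is
    undefined there (mirroring the exclusion described in the text).
    """
    pairs = [(m, d) for m, d in zip(mandate, daily_prop) if d > 0]
    x = [float(m) for m, _ in pairs]
    z = [math.log(d) for _, d in pairs]  # natural log (assumed)
    x_bar, z_bar = fmean(x), fmean(z)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need both mandate and non-mandate rows after filtering")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    return math.exp(sxz / sxx)


def percent_change(ratio):
    """Convert a back-transformed multiplicative effect into a percent change."""
    return (ratio - 1.0) * 100.0
```

For instance, `round(percent_change(0.72))` gives `-28`, matching the interpretation in the text.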
Figure 3: The distribution of the log proportion of daily new confirmed cases across all states.
Using ordinary least squares regression to assess the mask mandate attribute as a predictor of a state's proportion of confirmed cases, we found the mask mandate to be a significant predictor. Although we cannot conclude that a mask mandate directly causes fewer confirmed cases in a state, the presence of a mandate is significantly associated with a lower proportion of cases.
Fernandez, Marisa, and Maria Arias. 2020. “The States Where Face Coverings Are Mandatory.” Axios. https://www.axios.com/states-face-coverings-mandatory-a0e2fe35-5b7b-458e-9d28-3f6cdb1032fb.html.
McKinney, Wes. 2010. “Data Structures for Statistical Computing in Python.” In Proceedings of the 9th Python in Science Conference, edited by Stéfan van der Walt and Jarrod Millman, 56–61. https://doi.org/10.25080/Majora-92bf1922-00a.